 measure fairness


Measuring and signing fairness as performance under multiple stakeholder distributions

arXiv.org Artificial Intelligence

As learning machines increase their influence on decisions concerning human lives, analyzing their fairness properties becomes a subject of central importance. Yet our best tools for measuring the fairness of learning systems are rigid fairness metrics encapsulated as mathematical one-liners, which offer limited power to the stakeholders involved in the prediction task and are easy to manipulate when we exert excessive pressure to optimize them. To advance on these issues, we propose to shift focus from shaping fairness metrics to curating the distributions of examples under which these are computed. In particular, we posit that every claim about fairness should be immediately followed by the tagline "Fair under what examples, and collected by whom?". By highlighting connections to the literature in domain generalization, we propose to measure fairness as the ability of the system to generalize under multiple stress tests -- distributions of examples with social relevance. We encourage each stakeholder to curate one or multiple stress tests containing examples reflecting their (possibly conflicting) interests. The machine passes or fails each stress test by falling short of or exceeding a pre-defined metric value. The test results involve all stakeholders in a discussion about how to improve the learning system, and provide flexible assessments of fairness dependent on context and based on interpretable data. We provide full implementation guidelines for stress testing, illustrate both the benefits and shortcomings of this framework, and introduce a cryptographic scheme to enable a degree of prediction accountability from system providers.
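The pass/fail logic of stress testing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `StressTest`, `accuracy`, `run_stress_tests`, and `toy_predict` are hypothetical, accuracy is assumed as the metric, the thresholds are arbitrary, and the data is a toy set invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# An example is a (features, label) pair. This layout is an assumption.
Example = Tuple[Sequence[float], int]


@dataclass
class StressTest:
    """A stakeholder-curated set of examples with a pass threshold (hypothetical)."""
    name: str
    stakeholder: str
    examples: List[Example]
    threshold: float  # minimum accuracy required to pass (assumed metric)


def accuracy(predict: Callable[[Sequence[float]], int],
             examples: List[Example]) -> float:
    """Fraction of examples whose predicted label equals the true label."""
    if not examples:
        raise ValueError("a stress test needs at least one example")
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / len(examples)


def run_stress_tests(predict: Callable[[Sequence[float]], int],
                     tests: List[StressTest]) -> Dict[str, dict]:
    """Score the model on every stress test and record pass or fail."""
    report = {}
    for test in tests:
        score = accuracy(predict, test.examples)
        report[test.name] = {
            "stakeholder": test.stakeholder,
            "score": score,
            "passed": score >= test.threshold,
        }
    return report


def toy_predict(x: Sequence[float]) -> int:
    """Toy model: predicts 1 when the first feature exceeds 0.5."""
    return int(x[0] > 0.5)


# Two toy stress tests curated by different (hypothetical) stakeholders.
tests = [
    StressTest("applicants_group_a", "regulator",
               [([0.9], 1), ([0.2], 0), ([0.7], 1), ([0.1], 0)], 0.75),
    StressTest("applicants_group_b", "advocacy_group",
               [([0.4], 1), ([0.3], 1), ([0.8], 1), ([0.2], 0)], 0.75),
]

report = run_stress_tests(toy_predict, tests)
for name, result in report.items():
    print(name, result)
```

In this toy run the model passes the first test and fails the second, which is the kind of per-stakeholder result the abstract suggests should feed the discussion about how to improve the system.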


How to measure fairness when an algorithm decides

#artificialintelligence

Companies and governments that delegate decisions to machine learning algorithms, or use them to support decisions, provoke concern and even opposition. This is because high-stakes decisions are being automated and there is evidence that algorithms can replicate or amplify existing biases. The problem is that these issues are not fully resolved even when decisions are made by people, so there are no general criteria that can be clearly transferred to an algorithm. For example, when it comes to promoting gender fairness in recruitment, should men and women have the same opportunity, with competence determining who gets the position? Or should a vacancy be filled to maintain parity or a quota, even if that means passing over more capable candidates? Issues like these always arise when trying to ensure fairness, or avoid discrimination, in any aspect of the human condition where there are illegitimate differences or vulnerable groups.


Machine Learning Model Fairness in Practice

#artificialintelligence

In the last few years, interest in fairness in machine learning has gained a lot of momentum. Rightfully so: our models are becoming more and more prevalent in our daily lives, and their impact on society at large is rapidly increasing. I believe that today more than ever, it is crucial to make sure that the models we develop treat us, humans, fairly. Taken from Moritz Hardt's lecture notes. In this blog post I will try to answer those questions! There are many ways to measure fairness, and the right one varies from problem to problem and from person to person.